Efficient PAC learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digraphs
Author
Abstract
This paper considers the problem of computing an optimal policy for a Markov Decision Process (MDP), under lack of complete a priori knowledge of (i) the branching probability distributions determining the evolution of the process state upon the execution of the different actions, and (ii) the probability distributions characterizing the immediate rewards returned by the environment as a result of the execution of these actions at different states of the process. In addition, it is assumed that the underlying process evolves in a repetitive, episodic manner, with each episode starting from a well-defined initial state and evolving over an acyclic state space. A novel efficient algorithm for this problem is proposed, and its convergence properties and computational complexity are rigorously characterized in the formal framework of computational learning theory. Furthermore, in the process of deriving the aforementioned results, the presented work generalizes Bechhofer’s “indifference-zone” approach for the Ranking & Selection problem, which arises in statistical inference theory, so that it applies to populations with bounded general distributions.
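To make the indifference-zone idea for bounded populations concrete, here is a minimal sketch. It is not the paper's algorithm or its sample-complexity bound. It assumes rewards lie in a known interval [lo, hi] and applies the standard Hoeffding inequality with a union bound: drawing n >= 2 (hi - lo)^2 / delta^2 * ln(2k / alpha) samples from each of k populations keeps every sample mean within delta/2 of its true mean with probability at least 1 - alpha, so the population with the largest sample mean is within delta of the best. The function names and parameters are hypothetical, chosen for illustration.

```python
import math
import random


def hoeffding_sample_size(k, delta, alpha, lo=0.0, hi=1.0):
    """Samples per population so that, with probability >= 1 - alpha,
    all k sample means are within delta/2 of their true means.

    Assumption: every observation lies in [lo, hi] (bounded support).
    This is a generic Hoeffding + union-bound count, not the paper's bound.
    """
    width = hi - lo
    return math.ceil(2.0 * width ** 2 / delta ** 2 * math.log(2.0 * k / alpha))


def select_best(samplers, delta, alpha, lo=0.0, hi=1.0):
    """Return (index of the population with the largest sample mean, n).

    `samplers` is a list of zero-argument callables, each returning one
    bounded observation from its population (hypothetical interface).
    """
    n = hoeffding_sample_size(len(samplers), delta, alpha, lo, hi)
    means = [sum(draw() for _ in range(n)) / n for draw in samplers]
    best = max(range(len(samplers)), key=means.__getitem__)
    return best, n


if __name__ == "__main__":
    rng = random.Random(0)
    # Two Bernoulli populations with means 0.2 and 0.8 (illustrative data).
    populations = [
        lambda: 1.0 if rng.random() < 0.2 else 0.0,
        lambda: 1.0 if rng.random() < 0.8 else 0.0,
    ]
    best, n = select_best(populations, delta=0.3, alpha=0.01)
    print(f"selected population {best} using {n} samples per population")
```

In an acyclic, episodic MDP, a selection step of this kind could in principle be applied at each state to choose among actions, working backward from the terminal states; how the paper actually allocates samples across states is not stated in the abstract.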
Similar resources
Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces
This paper considers the problem of computing an optimal policy for a Markov Decision Process (MDP), under lack of complete a priori knowledge of (i) the branching probability distributions determining the evolution of the process state upon the execution of the different actions, and (ii) the probability distributions characterizing the immediate rewards returned by the environment as a result...
Efficient schedules for the problem of optimal node visitation in acyclic stochastic digraphs
Given a stochastic, acyclic, connected digraph with a single source node and a control agent that repetitively traverses this graph, each time starting from the source node, we want to define a control policy that will enable this agent to visit each of the graph terminal nodes a prespecified number of times, while minimizing the expected number of the graph traversals. We formulate this proble...
Arrival probability in the stochastic networks with an established discrete time Markov chain
The probable lack of some arcs and nodes in the stochastic networks is considered in this paper, and its effect is shown as the arrival probability from a given source node to a given sink node. A discrete time Markov chain with an absorbing state is established in a directed acyclic network. Then, the probability of transition from the initial state to the absorbing state is computed. It is as...
Longest Path in Networks of Queues in the Steady-State
Due to the importance of longest path analysis in networks of queues, we develop an analytical method for computing the steady-state distribution function of longest path in acyclic networks of queues. We assume the network consists of a number of queuing systems and each one has either one or infinite servers. The distribution function of service time is assumed to be exponential or Erlang. Fu...
Learning Evaluation Functions for Large Acyclic Domains
Some of the most successful recent applications of reinforcement learning have used neural networks and the TD(λ) algorithm to learn evaluation functions. In this paper, we examine the intuition that TD(λ) operates by approximating asynchronous value iteration. We note that on the important subclass of acyclic tasks, value iteration is inefficient compared with another graph algorithm, DAG-SP, wh...
Journal title:
- Discrete Event Dynamic Systems
Volume 17 Issue
Pages -
Publication date 2007